Skip to content

Go: add experimental query for IDNA digit-fold IP-literal smuggling#21784

Open
astrogilda wants to merge 3 commits intogithub:mainfrom
astrogilda:experimental-go-idna-ip-literal-smuggle
Open

Go: add experimental query for IDNA digit-fold IP-literal smuggling#21784
astrogilda wants to merge 3 commits intogithub:mainfrom
astrogilda:experimental-go-idna-ip-literal-smuggle

Conversation

@astrogilda
Copy link
Copy Markdown

@astrogilda astrogilda commented May 4, 2026

Summary

  • Adds a new experimental Go query go/idna-ip-literal-smuggle under go/ql/src/experimental/CWE-918/ that detects the UTS-46 IDNA digit-fold IP-literal smuggling anti-pattern.
  • Stateful taint configuration (TaintTracking::GlobalWithState) with two flow states (TPreIdna, TPostIdna) and a barrier that requires a trailing-dot trim followed by net.ParseIP / netip.ParseAddr to clear post-IDNA flow.
  • Inline-expectation tests with 23 unique sink alerts on the positive fixture and 1 intended-positive on the adversarial-witness binding case; 0 alerts on the negative fixture.

Motivation

golang.org/x/net/idna applies UTS-46 NFKC mapping inside (*Profile).ToASCII and (*Profile).ToUnicode for the Lookup, Display, Registration, and MapForLookup-derived profiles. The mapping folds 100 distinct non-ASCII Unicode digit codepoints partitioned into seven Unicode-block ranges (Latin-1 superscripts, mathematical superscripts, mathematical subscripts, circled digits, fullwidth digits, the Mathematical Alphanumeric Symbols block covering bold, double-struck, sans-serif, sans-serif-bold, and monospace digit styles, and segmented digits) to their ASCII equivalents. Devanagari digits (U+0966..U+096F) are not in scope: empirically verified against golang.org/x/net/idna v0.53.0, they pass through Punycode rather than fold to ASCII. The library contains no IP-literal detection, so a caller that runs net.ParseIP only before idna.ToASCII will accept a smuggled IPv4 literal such as "0.¹.0.0" (which maps to "0.1.0.0") into a downstream network sink that assumed the value was a domain name.

Single-state taint tracking is structurally insufficient: a pre-IDNA net.ParseIP barrier must NOT block flow that transitions through the IDNA call. The query therefore uses two flow states. Sources start in TPreIdna. The IDNA mapping call is modeled as a state-transition step that flips TPreIdna -> TPostIdna. Sinks are flagged only in TPostIdna. The barrier is net.ParseIP / net.ParseCIDR / netip.ParseAddr / netip.ParsePrefix, but only when the value reaching the parser was produced by a trailing-dot trim (strings.TrimRight(_, "."), strings.TrimSuffix(_, "."), or the manual if strings.HasSuffix(out, ".") { out = out[:len(out)-1] } slice form) of the IDNA output. The trim is required because "0.¹.0.0." maps to "0.1.0.0.", which net.ParseIP rejects on its own yet is still an IP literal for routing purposes.

The detection-strategy design choice (CodeQL inter-procedural taint as the primary recall vehicle, with Semgrep OSS as a high-precision direct-call secondary sweep) is documented in the upstream rules repo at docs/research/v0.1-detection-strategy.md. The short version: CodeQL TaintTracking::GlobalWithState is inter-procedural by default and propagates through one-deep wrappers like idnaASCII without an explicit isAdditionalFlowStep. That matters because the only two production callsites of any UTS-46-mapping profile across golang/go, kubernetes/kubernetes, and prometheus/prometheus (a 660 MB sweep) are wrapped by such a helper. A blanket structural rule of the form "ToASCII without recheck in same block" was considered and rejected: an empirical sweep across 31 production callsites in 19 repositories shows the rule would fire on PSL walkers, registrar pipelines, and TLS-SNI normalization sites where the mapped value never reaches a network-routing decision driven by attacker input, and the false-positive density is too high to ship as a default.

Scope is IPv4 only. IPv6 colons are rejected by IDNA rune-validation before UTS-46 mapping runs, so no IPv6 smuggle path exists.

Verification

Compile and unit-test commands (run from a working tree of github/codeql, with the staged files in place):

codeql query compile go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.ql
codeql test run     go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/

Local results from CodeQL CLI 2.25.3:

  • query compile: clean (1/1 success, 17 s).
  • test run: positive fixture produces 23 unique sink alerts plus 1 intended-positive on the adversarial-witness binding case; negative fixture produces 0 alerts. Inline-expectations match.

The positive fixture exercises the canonical wrapper shape modeled on golang.org/x/net/http/httpproxy.canonicalAddr (a one-deep idnaASCII helper consumed by both net.JoinHostPort and net.Dial-class sinks). The query fires twice on this shape, once per sink. That is the empirical evidence that TaintTracking::GlobalWithState propagates through the wrapper with no additional-flow-step model required.

The negative fixture covers the safe pattern (post-IDNA strings.TrimRight / strings.TrimSuffix / manual HasSuffix slice followed by net.ParseIP or netip.ParseAddr), the Punycode profile (which has a nil mapping function and is correctly excluded from the idnaMappingCall predicate), and unrelated hostname use that never transits IDNA.

Disposition by the upstream Go security team

The Go security team reviewed the underlying digit-fold behavior and declined to treat it as a library bug on 2026-04-29: their position is that UTS-46 is correctly implemented and the recheck obligation belongs to the caller. This query implements the caller-side detection that follows from that disposition. It is not a substitute for a library fix; it is the alert that surfaces the contract on the call site where the recheck is missing.

Source artefacts

The query, library, qhelp, examples, and test fixtures originate from the upstream rules repository astrogilda/idna-ip-literal-smuggle-rules (codeql/ subdirectory). The upstream repo also carries the corpus-eval scripts, the v0.1.x detection-strategy synthesis under docs/research/, and complementary Semgrep and gopatch rules for the same anti-pattern.

CLA

I will sign the CLA on first PR comment per the registry workflow.

References

Cross-language analogs of the same caller-side IP-literal-validation gap that this query addresses for Go IDNA:

  • CVE-2021-29923: Go net.ParseIP accepted IPv4 octets with leading zeros, allowing SSRF allowlist bypass via octal interpretation downstream.
  • CVE-2024-12224: Caller-side IP-literal validation gap producing comparable allowlist bypass behavior.
  • CVE-2024-3651: idna (Python) UTS-46 / Punycode handling input that drove disproportionate parser cost; same library family, different failure mode but illustrates the recurring pattern of UTS-46 input requiring caller-side validation that the library does not provide.

Adds go/idna-ip-literal-smuggle under go/ql/src/experimental/CWE-918/.

The query detects callers that pass an untrusted hostname through
golang.org/x/net/idna (UTS-46 NFKC mapping folds 100 non-ASCII Unicode
digit codepoints to ASCII) into a network sink without rechecking the
post-IDNA value as an IP literal after a trailing-dot trim. Without
the recheck, an input such as "0.[U+00B9].0.0" maps to "0.1.0.0" and
slips past any pre-IDNA net.ParseIP guard, smuggling an IPv4 literal
into net.JoinHostPort, net.Dial, http.Request.URL.Host,
tls.Config.ServerName, http.Cookie.Domain, and the DNS resolver
entry points.

Implementation uses TaintTracking::GlobalWithState with two flow
states (TPreIdna, TPostIdna). The IDNA mapping call is a state
transition step. The barrier is a trailing-dot trim followed by
net.ParseIP / net.ParseCIDR / netip.ParseAddr / netip.ParsePrefix in
TPostIdna; a bare ParseIP without the prior trim does not sanitize
because "0.1.0.0." is rejected by ParseIP yet remains a valid IP
literal for routing. IPv6 is out of scope because the colon is a
UTS-46 disallowed rune.

Tests: 23 unique sink alerts on the positive fixture (which includes
the canonical canonicalAddr-shape wrapper), no false positives on the
negative fixture (safe pattern, Punycode profile, non-IDNA hostname
use). codeql test run reports 1 of 1 PASSED.
@astrogilda astrogilda requested a review from a team as a code owner May 4, 2026 06:43
Copilot AI review requested due to automatic review settings May 4, 2026 06:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new experimental Go security query for detecting IDNA UTS-46 digit-fold cases where untrusted hostnames become IPv4 literals after IDNA normalization and then flow into network-relevant sinks without a post-IDNA IP recheck.

Changes:

  • Adds a new stateful taint-tracking library/query under go/ql/src/experimental/CWE-918/ plus qhelp and example code.
  • Adds positive and negative inline-expectation fixtures covering sinks, barriers, and a witness-binding regression case.
  • Adds a dedicated Go test module and expected output for the new experimental query.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/positives.go Positive inline-expectation cases for vulnerable IDNA-to-sink flows.
go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/negatives.go Negative and regression fixtures for safe post-IDNA rechecks and non-sink cases.
go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/go.mod Standalone module/dependency setup for the new Go test fixture.
go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/IdnaIpLiteralSmuggle.qlref Connects the test directory to the experimental query and inline postprocessors.
go/ql/test/experimental/CWE-918/IdnaIpLiteralSmuggle/IdnaIpLiteralSmuggle.expected Generated expected output baseline for the inline-expectation test run.
go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggleGood.go Safe example used by the query help documentation.
go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggleBad.go Vulnerable example used by the query help documentation.
go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.qll Core stateful taint-tracking model for IDNA transitions, barriers, and sinks.
go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.ql Query entry point and result metadata for the new path-problem query.
go/ql/src/experimental/CWE-918/IdnaIpLiteralSmuggle.qhelp End-user documentation, remediation guidance, and examples for the query.

Comment on lines +106 to +118
// Compliant: post-IDNA TrimSuffix + netip.ParseAddr recheck before (*Resolver).LookupIPAddr.
func compliantResolverLookupIPAddr(req *http.Request) {
host := req.Header.Get("X-HOST-IPADDR-OK")
ace, err := idna.Lookup.ToASCII(host)
if err != nil {
return
}
candidate := strings.TrimSuffix(ace, ".")
if _, parseErr := netip.ParseAddr(candidate); parseErr == nil {
return
}
r := &net.Resolver{}
r.LookupIPAddr(context.Background(), ace) // OK: post-IDNA recheck barrier
Comment on lines +163 to +181
// --- Class 8: Devanagari digits (U+0966..U+096F) ---
// UTS-46 NFKC folds Devanagari digits to ASCII equivalents.
// "१.0.0.0" (U+0967 DEVANAGARI ONE) -> "1.0.0.0"
func smuggleDevanagariDigit(req *http.Request) {
host := req.Header.Get("X-HOST-DEVANAGARI") // $ Source
ace, _ := idna.Lookup.ToASCII(host)
http.Get("https://" + ace + "/") // $ Alert
}

// --- Class 8 second positive: Devanagari digit U+0969 -> "3",
// (*net.Resolver).LookupIPAddr sink ---
// "१.0.0.३" -> "1.0.0.3"
func smuggleDevanagariResolverLookupIPAddr(req *http.Request) {
host := req.Header.Get("X-HOST-DEVANAGARI-TWO") // $ Source
ace, _ := idna.Lookup.ToASCII(host)
r := &net.Resolver{}
r.LookupIPAddr(context.Background(), ace) // $ Alert
}

Comment on lines +31 to +32
* `(*idna.Profile).ToASCII` (and the package-level `idna.ToASCII`,
* `Lookup.ToASCII`, `MapForLookup().ToASCII`) is modeled as a
ace, _ := idna.Punycode.ToASCII(host)
http.Get("https://" + ace + "/") // OK: no digit-fold profile
}

Comment on lines +72 to +73
Vulnerable pattern. <code>net.ParseIP</code> is called only before
<code>idna.ToASCII</code>, so the smuggled literal slips through:
Comment on lines +11 to +22
that folds <strong>100 distinct non-ASCII Unicode digit codepoints</strong>
across 8 families to their ASCII equivalents. The 8 families are:
</p>
<ul>
<li>Latin-1 superscripts (U+00B2, U+00B3, U+00B9): 3 codepoints</li>
<li>Mathematical superscripts (U+2070, U+2074..U+2079): 7 codepoints</li>
<li>Mathematical subscripts (U+2080..U+2089): 10 codepoints</li>
<li>Circled digits (U+2460..U+2468, U+24EA): 10 codepoints</li>
<li>Fullwidth digits (U+FF10..U+FF19): 10 codepoints</li>
<li>Mathematical bold, sans-serif, double-struck, and monospace digits
(U+1D7CE..U+1D7FF): 50 codepoints</li>
<li>Segmented digits (U+1FBF0..U+1FBF9): 10 codepoints</li>
Comment on lines +61 to +65
single trailing dot and call <code>net.ParseIP</code> (or
<code>netip.ParseAddr</code>) on the result, then reject on non-nil. The
trailing-dot trim is required because <code>"0.¹.0.0."</code> maps to
<code>"0.1.0.0."</code>, which <code>net.ParseIP</code> rejects on its
own yet is still an IP literal for routing purposes.
Comment on lines +86 to +97
The safe pattern accepts three equivalent trailing-dot trim forms:
</p>
<ul>
<li><code>strings.TrimRight(ace, ".")</code>: multi-dot form. Handles
the fullwidth and ideographic dot variants that produce multiple
trailing ASCII dots after UTS-46 mapping.</li>
<li><code>strings.TrimSuffix(ace, ".")</code>: single-dot form.
Sufficient for most inputs but incomplete for the multi-dot
variant.</li>
<li><code>if strings.HasSuffix(ace, ".") { ace = ace[:len(ace)-1] }</code>:
manual slice form. Equivalent to <code>TrimSuffix</code> in
effect.</li>
Comment on lines +51 to +56
// True-negative: caller uses idna.Display for human rendering only; the
// output never reaches a network sink in this function.
func displayOnly(req *http.Request) {
host := req.Header.Get("X-HOST-DISPLAY")
disp, _ := idna.Display.ToUnicode(host)
_ = disp // OK: never reaches a sink
Comment on lines +3 to +10
* @description An untrusted hostname flows through `golang.org/x/net/idna`
* mapping (which folds 100 non-ASCII Unicode digit codepoints to
* ASCII via UTS-46 NFKC) and reaches a security-relevant
* hostname sink without a post-IDNA IP-literal recheck. A
* caller that calls `net.ParseIP` only BEFORE `idna.ToASCII`
* will accept a smuggled IPv4 literal such as `"0.¹.0.0"`
* (which maps to `"0.1.0.0"`). Scope is IPv4 only because
* IPv6 colons are rejected by IDNA rune-validation before
astrogilda added 2 commits May 4, 2026 11:08
Fix several drift issues between the prose and the predicates:

- qhelp now lists the seven Unicode-block ranges that account for the
  100 fold codepoints, replacing the prior "8 families" claim that did
  not match the bullet list. Adds an explicit note that Devanagari
  digits do not fold (verified empirically against
  golang.org/x/net/idna v0.53.0) and that Registration disallows every
  fold codepoint at rune validation.
- qhelp recommendation corrects netip.ParseAddr semantics (parsed via
  err == nil, not via a non-nil return) and rewrites the trim-form
  list to state that TrimSuffix and the manual slice are not
  equivalent to TrimRight for multi-trailing-dot inputs.
- qhelp example caption matches the actual sample, which shows the
  no-recheck shape, not a pre-IDNA ParseIP shape.
- qll module docstring now covers both ToASCII and ToUnicode on the
  digit-folding profiles, lists the actual idna.New(MapForLookup)
  construction shape, and explicitly states that the package-level
  idna.ToASCII helper and the Punycode profile are excluded.
- ql @description and select message reflect that the model covers
  both ToASCII and ToUnicode and that the no-recheck case is the
  primary anti-pattern, not just the pre-IDNA-ParseIP case.
Three fixture changes plus a baseline refresh:

- Drop the two Devanagari positives. Empirical testing against
  golang.org/x/net/idna v0.53.0 confirms that U+0966..U+096F do not
  fold to ASCII via UTS-46; they pass through Punycode (xn--*) on all
  four profiles. Keeping the cases in would be misleading because the
  query fires structurally without the runtime smuggle ever existing.
- Add two positives covering Profile.ToUnicode on Latin-1 and Math
  superscript inputs. The library runs validateAndMap before the
  encode-vs-decode branch, so ToUnicode produces the same digit-folded
  ASCII output as ToASCII for the in-scope codepoints. The earlier
  fixture only exercised ToASCII despite the model handling both.
- Add three negatives. Two pin the ParseCIDR and ParsePrefix branches
  of the recheck-input predicate, which had no sink-reaching coverage
  before. The third pins the documented exclusion of the package-level
  idna.ToASCII helper against future broadening of the call matcher.

Baseline refreshed via codeql test run --learn after the fixture
changes shifted line numbers and the new select message text replaced
the old one.
@astrogilda
Copy link
Copy Markdown
Author

Pushed two follow-ups for the review pass.

docs: align qhelp/qll/ql with implemented model (6392c52)

  • qhelp recommendation: corrected netip.ParseAddr semantics (parsed via err == nil, not via a non-nil return).
  • qhelp example caption: matches the actual IdnaIpLiteralSmuggleBad.go sample, which shows the no-recheck shape, not a pre-IDNA ParseIP shape.
  • qhelp trim-form list: states explicitly that TrimSuffix and the manual slice form are not equivalent to TrimRight for multi-trailing-dot inputs.
  • qhelp family count: replaced "8 families" with the seven Unicode-block ranges that account for the 100 fold codepoints. Added a note that Devanagari digits are out of scope, verified empirically against golang.org/x/net/idna v0.53.0.
  • qll header: now covers both ToASCII and ToUnicode on the digit-folding profiles, lists the actual idna.New(idna.MapForLookup(), ...) construction shape, and explicitly states that the package-level idna.ToASCII helper is excluded.
  • ql @description and select message: cover both ToASCII and ToUnicode, frame the no-recheck case as the primary anti-pattern.

test: tighten fixture coverage and refresh baseline (771b433)

  • Dropped the two Devanagari positives. Empirical probe against golang.org/x/net/idna v0.53.0: U+0966..U+096F do not fold under any of the four UTS-46 profiles; they pass through Punycode (xn--*). Keeping the cases would have been misleading because the query fired structurally without the runtime smuggle ever existing.
  • Added two ToUnicode positives (Latin-1 superscript on Lookup, Math superscript on Display). The library runs validateAndMap before the encode-vs-decode branch, so ToUnicode produces the same digit-folded ASCII output as ToASCII for in-scope codepoints.
  • Added safe-pattern negatives for net.ParseCIDR and netip.ParsePrefix. Both were modeled in the recheck-input predicate but had no sink-reaching coverage before.
  • Added a negative for the package-level idna.ToASCII helper to pin the documented exclusion against future broadening of the call matcher.

Baseline refreshed via codeql test run --learn after the fixture changes shifted line numbers and the new select message text replaced the old one.

One precision gap I noticed during the round but did not pull into this commit, for transparency: the idnaMappingCall Punycode exclusion is syntactic and would miss aliased forms like pp := idna.Punycode; pp.ToASCII(...). A localFlow-from-idna.Punycode form would catch them. Marginal in practice. I can fold it into a third commit on this branch if you prefer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants